feat(snowflake)!: Support transpilation of TO_DOUBLE from snowflake to duckdb #6658
SQLGlot Integration Test Results
Overall
main: 6173 total, 5610 passed (pass rate: 90.9%)
sqlglot:transpile_TO_DOUBLE_snowflake_duckdb: 6173 total, 5612 passed (pass rate: 90.9%)
Difference: No change
Should we really be replacing unsupported characters? Would it be more explicit to just fail for those cases?
To be more specific, although these characters are not supported by DuckDB's REGEXP_REPLACE, they are supported by Snowflake because they are part of the formatted numeric strings. For example, the '$' and ',' in the string '$1,234.56' are valid if the specified format is '$9,999.99' (TO_DOUBLE('$1,234.56', '$9,999.99')). So during transpilation, we need to remove them.
    value = expression.this
    format_arg = expression.args.get("format")

    if format_arg and isinstance(format_arg, exp.Literal):
Does this branch handle every format supported by Snowflake? It's quite complicated and so I'd like to understand if the complexity stems from it being a complete solution or just hacking it together.
https://docs.snowflake.com/en/sql-reference/sql-format-models
This handles the formats listed under the "Fixed-position numeric formats" section, except for the hexadecimal digit element (I couldn't find a working example of it and thus can't test it). These formats are included in the test query shown above.
There are also text-minimal numeric formats, and as far as I know they are just the default format for DOUBLE and DECIMAL.
I'm concerned with getting this in, because it feels like we're addressing a small subdomain of the possible model formats and the added complexity to achieve this doesn't look trivial. Let's postpone dealing with this transpilation until we actually need it.
https://docs.snowflake.com/en/sql-reference/functions/to_double
DuckDB's CAST operation is able to parse strings into numbers, as long as they only contain numeric characters with special characters like '.', '+' or '-'.
For the transpilation of TO_DOUBLE to work, we need to remove unsupported characters from the input strings using REGEXP_REPLACE. Also, if the sign appears at the end of the string, we need to move it to the front if it's '-', or drop it if it's '+'.
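The normalization described above can be sketched in plain Python to make the intended behavior concrete. This is only an illustration of the idea, not the code in this PR: the helper names `normalize_numeric_string` and `to_double_sketch` are hypothetical, and the sketch assumes that everything other than digits, '.', '+' and '-' is formatting noise (so exponent notation is deliberately not handled here).

```python
import re

# Assumption: only digits, '.', '+' and '-' survive; anything else
# (currency symbols, group separators, spaces) is treated as formatting.
_UNSUPPORTED = re.compile(r"[^0-9.+\-]")
_TRAILING_SIGN = re.compile(r"^(.*)([+-])$")


def normalize_numeric_string(text: str) -> str:
    """Hypothetical helper: strip formatting characters and fix a trailing sign."""
    cleaned = _UNSUPPORTED.sub("", text)
    match = _TRAILING_SIGN.match(cleaned)
    if match:
        body, sign = match.groups()
        # A trailing '-' moves to the front; a trailing '+' is dropped.
        cleaned = f"-{body}" if sign == "-" else body
    return cleaned


def to_double_sketch(text: str) -> float:
    """Hypothetical stand-in for casting the cleaned string to a double."""
    return float(normalize_numeric_string(text))


if __name__ == "__main__":
    for sample in ["$1,234.56", "1,234.56-", "1,234.56+", "-12.5"]:
        print(sample, "->", to_double_sketch(sample))
```

In the actual transpilation the same two steps would be expressed in SQL (REGEXP_REPLACE followed by CAST), so the Python version is useful mainly as a reference for expected results when writing test cases.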
Test queries
source query
transpiled query:
Both queries produce the same results. It should be noted that with Snowflake CAST(JSON('1.7976931348623157e+308') AS DOUBLE) loses precision and becomes 1.79769313486232e+308